{
 "metadata": {
  "name": "",
  "signature": "sha256:57a52346b3b6c058c1a5f9a8259a4be6c3077b86bc79b934c1fdf02a73fca1df"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Implement estimators of large-scale sparse Gaussian densities"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "by Soumyajit De (email: heavensdevil6909@gmail.com, soumyajitde@cse.iitb.ac.in. Github: <a href=\"https://github.com/lambday\">lambday</a>)<br/>\n",
      "Many many thanks to my mentor Heiko Strathmann, Sergey Lisitsyn, S\u00f6ren Sonnenburg, Viktor Gal"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This notebook illustrates large-scale sparse [Gaussian density](http://en.wikipedia.org/wiki/Normal_distribution) [likelihood](http://en.wikipedia.org/wiki/Likelihood_function) estimation. It first introduces the reader to the mathematical background and then shows how one can do the estimation with Shogun on a number of real-world data sets."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h2>Theoretical introduction</h2>\n",
      "<p><i>Multivariate Gaussian distributions</i>, i.e. some random vector  $\\mathbf{x}\\in\\mathbb{R}^n$ having probability density function\n",
      "$$p(\\mathbf{x}|\\boldsymbol\\mu, \\boldsymbol\\Sigma)=(2\\pi)^{-n/2}\\text{det}(\\boldsymbol\\Sigma)^{-1/2} \\exp\\left(-\\frac{1}{2}(\\mathbf{x}-\\boldsymbol\\mu)^{T}\\boldsymbol\\Sigma^{-1}(\\mathbf{x}-\\boldsymbol\\mu)\\right)$$\n",
      "$\\boldsymbol\\mu$ being the mean vector and $\\boldsymbol\\Sigma$ being the covariance matrix, arise in numerous occassions involving large datasets. Computing <i>log-likelihood</i> in these requires computation of the log-determinant of the covariance matrix\n",
      "$$\\mathcal{L}(\\mathbf{x}|\\boldsymbol\\mu,\\boldsymbol\\Sigma)=-\\frac{n}{2}\\log(2\\pi)-\\frac{1}{2}\\log(\\text{det}(\\boldsymbol\\Sigma))-\\frac{1}{2}(\\mathbf{x}-\\boldsymbol\\mu)^{T}\\boldsymbol\\Sigma^{-1}(\\mathbf{x}-\\boldsymbol\\mu)$$\n",
      "The covariance matrix and its inverse are symmetric positive definite (spd) and are often sparse, e.g. due to conditional independence properties of Gaussian Markov Random Fields (GMRF). Therefore they can be stored efficiently even for large dimension $n$.</p>\n",
      "\n",
      "<p>The usual technique for computing the log-determinant term in the likelihood expression relies on <i><a href=\"http://en.wikipedia.org/wiki/Cholesky_factorization\">Cholesky factorization</a></i> of the matrix, i.e. $\\boldsymbol\\Sigma=\\mathbf{LL}^{T}$, ($\\mathbf{L}$ is the lower triangular Cholesky factor) and then using the diagonal entries of the factor to compute $\\log(\\text{det}(\\boldsymbol\\Sigma))=2\\sum_{i=1}^{n}\\log(\\mathbf{L}_{ii})$. However, for sparse matrices, as covariance matrices usually are, the Cholesky factors often suffer from <i>fill-in</i> phenomena - they turn out to be not so sparse themselves. Therefore, for large dimensions this technique becomes infeasible because of a massive memory requirement for storing all these irrelevant non-diagonal co-efficients of the factor. While ordering techniques have been developed to permute the rows and columns beforehand in order to reduce fill-in, e.g. <i><a href=\"http://en.wikipedia.org/wiki/Minimum_degree_algorithm\">approximate minimum degree</a></i> (AMD) reordering, these techniques depend largely on the sparsity pattern and therefore not guaranteed to give better result.</p>\n",
      "\n",
      "<p>Recent research shows that using a number of techniques from complex analysis, numerical linear algebra and greedy graph coloring, we can, however, approximate the log-determinant up to an arbitrary precision [<a href=\"http://link.springer.com/article/10.1007%2Fs11222-012-9368-y\">Aune et. al., 2012</a>]. The main trick lies within the observation that we can write $\\log(\\text{det}(\\boldsymbol\\Sigma))$ as $\\text{trace}(\\log(\\boldsymbol\\Sigma))$, where $\\log(\\boldsymbol\\Sigma)$ is the matrix-logarithm. Computing the log-determinant then requires extracting the trace of the matrix-logarithm as\n",
      "$$\\text{trace}(\\log(\\boldsymbol\\Sigma))=\\sum_{j=1}^{n}\\mathbf{e}^{T}_{j}\\log(\\boldsymbol\\Sigma)\\mathbf{e}_{j}$$\n",
      "where each $\\mathbf{e}_{j}$ is a unit basis vector having a 1 in its $j^{\\text{th}}$ position while rest are zeros and we assume that we can compute $\\log(\\boldsymbol\\Sigma)\\mathbf{e}_{j}$ (explained later). For large dimension $n$, this approach is still costly, so one needs to rely on sampling the trace. For example, using stochastic vectors we can obtain a <i><a href=\"http://en.wikipedia.org/wiki/Monte_Carlo_method\">Monte Carlo estimator</a></i> for the trace -\n",
      "$$\\text{trace}(\\log(\\boldsymbol\\Sigma))=\\mathbb{E}_{\\mathbf{v}}(\\mathbf{v}^{T}\\log(\\boldsymbol\\Sigma)\\mathbf{v})\\approx \\sum_{j=1}^{k}\\mathbf{s}^{T}_{j}\\log(\\boldsymbol\\Sigma)\\mathbf{s}_{j}$$\n",
      "where the source vectors ($\\mathbf{s}_{j}$) have zero mean and unit variance (e.g. $\\mathbf{s}_{j}\\sim\\mathcal{N}(\\mathbf{0}, \\mathbf{I}), \\forall j\\in[1\\cdots k]$). But since this is a Monte Carlo method, we need many many samples to get sufficiently accurate approximation. However, by a method suggested in Aune et. al., we can reduce the number of samples required drastically by using <i>probing-vectors</i> that are obtained from <a href=\"http://en.wikipedia.org/wiki/Graph_coloring\">coloring of the adjacency graph</a> represented by the power of the sparse-matrix, $\\boldsymbol\\Sigma^{p}$, i.e. we can obtain -\n",
      "$$\\mathbb{E}_{\\mathbf{v}}(\\mathbf{v}^{T}\\log(\\boldsymbol\\Sigma)\\mathbf{v})\\approx \\sum_{j=1}^{m}\\mathbf{w}^{T}_{j}\\log(\\boldsymbol\\Sigma)\\mathbf{w}_{j}$$\n",
      "with $m\\ll n$, where $m$ is the number of colors used in the graph coloring. For a particular color $j$, the probing vector $\\mathbb{w}_{j}$ is obtained by filling with $+1$ or $-1$ uniformly randomly for entries corresponding to nodes of the graph colored with $j$, keeping the rest of the entries as zeros. Since the matrix is sparse, the number of colors used is usually very small compared to the dimension $n$, promising the advantage of this approach.</p>\n",
      "\n",
      "<p>There are two main issues in this technique. First, computing $\\boldsymbol\\Sigma^{p}$ is computationally costly, but experiments show that directly applying a <i>d-distance</i> coloring algorithm on the sparse matrix itself also results in a pretty good approximation. Second, computing the exact matrix-logarithm is often infeasible because its is not guaranteed to be sparse. Aune et. al. suggested that we can rely on rational approximation of the matrix-logarithm times vector using an approach described in <a href=\"http://eprints.ma.man.ac.uk/1136/01/covered/MIMS_ep2007_103.pdf\">Hale et. al [2008]</a>, i.e. writing $\\log(\\boldsymbol\\Sigma)\\mathbf{w}_{j}$ in our desired expression using <i><a href=\"http://en.wikipedia.org/wiki/Cauchy's_integral_formula\">Cauchy's integral formula</a></i> as -\n",
      "$$log(\\boldsymbol\\Sigma)\\mathbf{w}_{j}=\\frac{1}{2\\pi i}\\oint_{\\Gamma}log(z)(z\\mathbf{I}-\\boldsymbol\\Sigma)^{-1}\\mathbf{w}_{j}dz\\approx \\frac{-8K(\\lambda_{m}\\lambda_{M})^{\\frac{1}{4}}}{k\\pi N} \\boldsymbol\\Sigma\\Im\\left(-\\sum_{l=1}^{N}\\alpha_{l}(\\boldsymbol\\Sigma-\\sigma_{l}\\mathbf{I})^{-1}\\mathbf{w}_{j}\\right)$$\n",
      "$K$, $k \\in \\mathbb{R}$,  $\\alpha_{l}$, $\\sigma_{l} \\in \\mathbb{C}$ are coming from <i><a href=\"http://en.wikipedia.org/wiki/Jacobi_elliptic_functions\">Jacobi elliptic functions</a></i>, $\\lambda_{m}$ and $\\lambda_{M}$ are the minimum/maximum eigenvalues of $\\boldsymbol\\Sigma$ (they have to be real-positive), respectively, $N$ is the number of contour points in the quadrature rule of the above integral and  $\\Im(\\mathbf{x})$ represents the imaginary part of $\\mathbf{x}\\in\\mathbb{C}^{n}$.</p>\n",
      "\n",
      "<p>The problem then finally boils down to solving the shifted family of linear systems $(\\boldsymbol\\Sigma-\\sigma_{l}\\mathbf{I})\\mathbb{x}_{j}=\\mathbb{w}_{j}$. Since $\\boldsymbol\\Sigma$ is sparse, matrix-vector product is not much costly and therefore these systems can be solved with a low memory-requirement using <i>Krylov subspace iterative solvers</i> like <i><a href=\"http://en.wikipedia.org/wiki/Conjugate_gradient_method\">Conjugate Gradient</a></i> (CG). Since the shifted matrices have complex entries along their diagonal, the appropriate method to choose is <i>Conjugate Orthogonal Conjugate Gradient</i> (COCG) [<a href=\"http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=106415&tag=1\">H.A. van der Vorst et. al., 1990.</a>]. Alternatively, these systems can be solved at once using <i>CG-M</i> [<a href\"http://arxiv.org/abs/hep-lat/9612014\">Jegerlehner, 1996.</a>] solver which solves for $(\\mathbf{A}+\\sigma\\mathbf{I})\\mathbf{x}=\\mathbf{b}$ for all values of $\\sigma$ using as many matrix-vector products in the CG-iterations as required to solve for one single shifted system. This algorithm shows reliable convergance behavior for systems with reasonable condition number.</p>\n",
      "\n",
      "<p>One interesting property of this approach is that once the graph coloring information and shifts/weights are known, all the computation components - solving linear systems, computing final vector-vector product - are independently computable. Therefore, computation can be speeded up using parallel computation of these. To use this, a computation framework for Shogun is developed and the whole log-det computation works on top of it.</p>\n",
      "\n",
      "<h2>An example of using this approach in Shogun</h2>\n",
      "<p>We demonstrate the usage of this technique to estimate log-determinant of a real-valued spd sparse matrix with dimension $715,176\\times 715,176$ with $4,817,870$ non-zero entries, <a href=\"http://www.cise.ufl.edu/research/sparse/matrices/GHS_psdef/apache2.html\">apache2</a>, which is obtained from the <a href=\"http://www.cise.ufl.edu/research/sparse/matrices/\">The University of Florida Sparse Matrix Collection</a>. Cholesky factorization with AMD for this sparse-matrix gives rise to factors with $353,843,716$ non-zero entries (from source). We use CG-M solver to solve the shifted systems which is then used with <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CSerialComputationEngine.html\">SerialComputationEngine</a> to perform the computation jobs sequentially on a single core, that works even on a normal Desktop machine. Since the original matrix is badly conditioned, here we added a ridge along its diagonal to reduce the condition number so that the CG-M solver converges within reasonable time. Please note that for high condition number, the number of iteration has to be set very high."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "from scipy.sparse import eye\n",
      "from scipy.io import mmread\n",
      "from matplotlib import pyplot as plt\n",
      "\n",
      "matFile='../../../data/logdet/apache2.mtx.gz'\n",
      "M = mmread(matFile)\n",
      "rows = M.shape[0]\n",
      "cols = M.shape[1]\n",
      "A = M + eye(rows, cols) * 10000.0\n",
      "plt.title(\"A\")\n",
      "plt.spy(A, precision = 1e-2, marker = '.', markersize = 0.01)\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First, to keep the notion of Krylov subspace, we view the matrix as a linear operator that applies on a vector, resulting a new vector. We use <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CSparseMatrixOperator.html\">RealSparseMatrixOperator</a> that is suitable for this example. All the solvers work with <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLinearOperator.html\">LinearOperator</a> type objects. For computing the eigenvalues, we use <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLanczosEigenSolver.html\">LanczosEigenSolver</a> class. Although computation of the Eigenvalues is done internally within the log-determinant estimator itself (see below), here we explicitely precompute them."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import RealSparseMatrixOperator, LanczosEigenSolver\n",
      "\n",
      "op = RealSparseMatrixOperator(A.tocsc())\n",
      "\n",
      "# Lanczos iterative Eigensolver to compute the min/max Eigenvalues which is required to compute the shifts\n",
      "eigen_solver = LanczosEigenSolver(op)\n",
      "# we set the iteration limit high to compute the eigenvalues more accurately, default iteration limit is 1000\n",
      "eigen_solver.set_max_iteration_limit(2000)\n",
      "\n",
      "# computing the eigenvalues\n",
      "eigen_solver.compute()\n",
      "print 'Minimum Eigenvalue:', eigen_solver.get_min_eigenvalue()\n",
      "print 'Maximum Eigenvalue:', eigen_solver.get_max_eigenvalue()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Next, we use <a href=\"http://www.shogun-toolbox.org/doc/en/latest/ProbingSampler_8h_source.html\">ProbingSampler</a> class which uses an external library <a href=\"http://www.cscapes.org/coloringpage/\">ColPack</a>. Again, the number of colors used is precomputed for demonstration purpose, although computed internally inside the log-determinant estimator."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# We can specify the power of the sparse-matrix that is to be used for coloring, default values will apply a\n",
      "# 2-distance greedy graph coloring algorithm on the sparse-matrix itself. Matrix-power, if specified, is computed in O(lg p)\n",
      "from modshogun import ProbingSampler\n",
      "\n",
      "trace_sampler = ProbingSampler(op)\n",
      "# apply the graph coloring algorithm and generate the number of colors, i.e. number of trace samples\n",
      "trace_sampler.precompute()\n",
      "print 'Number of colors used:', trace_sampler.get_num_samples()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<p>This corresponds to averaging over 13 source vectors rather than one (but has much lower variance as using 13 Gaussian source vectors). A comparison between the convergence behavior of using probing sampler and Gaussian sampler is presented later.</p>\n",
      "\n",
      "<p>Then we define <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLogRationalApproximationCGM.html\">LogRationalApproximationCGM</a> operator function class, which internally uses the Eigensolver to compute the Eigenvalues, uses <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CJacobiEllipticFunctions.html\">JacobiEllipticFunctions</a> to compute the complex shifts, weights and the constant multiplier in the rational approximation expression, takes the probing vector generated by the trace sampler and submits a computation job to the engine which then uses CG-M solver (<a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CCGMShiftedFamilySolver.html\">CGMShiftedFamilySolver</a>) to solve the shifted systems. Precompute is not necessary here too.</p>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import SerialComputationEngine, CGMShiftedFamilySolver, LogRationalApproximationCGM\n",
      "\n",
      "engine = SerialComputationEngine()\n",
      "cgm = CGMShiftedFamilySolver()\n",
      "# setting the iteration limit (set this to higher value for higher condition number)\n",
      "cgm.set_iteration_limit(100)\n",
      "\n",
      "# accuracy determines the number of contour points in the rational approximation (i.e. number of shifts in the systems)\n",
      "accuracy = 1E-15\n",
      "\n",
      "# we create a operator-log-function using the sparse matrix operator that uses CG-M to solve the shifted systems\n",
      "op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, accuracy)\n",
      "op_func.precompute()\n",
      "print 'Number of shifts:', op_func.get_num_shifts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Finally, we use the <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLogDetEstimator.html\">LogDetEstimator</a> class to sample the log-determinant of the matrix."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import LogDetEstimator\n",
      "\n",
      "# number of log-det samples (use a higher number to get better estimates)\n",
      "# (this is 5 times number of colors estimate in practice, so usually 1 probing estimate is enough)\n",
      "num_samples = 5\n",
      "\n",
      "log_det_estimator = LogDetEstimator(trace_sampler, op_func, engine)\n",
      "estimates = log_det_estimator.sample(num_samples)\n",
      "\n",
      "estimated_logdet = mean(estimates)\n",
      "print 'Estimated log(det(A)):', estimated_logdet"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To verify the accuracy of the estimate, we compute exact log-determinant of A using Cholesky factorization using <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CStatistics.html#a9931a4ea72310b239efdc05503442525\">Statistics::log_det</a> method."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# the following method requires massive amount of memory, for demonstration purpose\n",
      "# the following code is commented out and direct value obtained from running it once is used\n",
      "\n",
      "# from modshogun import Statistics\n",
      "# actual_logdet = Statistics.log_det(A)\n",
      "\n",
      "actual_logdet = 7120357.73878\n",
      "print 'Actual log(det(A)):', actual_logdet\n",
      "\n",
      "plt.hist(estimates)\n",
      "plt.plot([actual_logdet, actual_logdet], [0,len(estimates)], linewidth=3)\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h2>Statistics</h2>\n",
      "We use a smaller sparse-matrix, <a href=\"http://www.cise.ufl.edu/research/sparse/matrices/HB/west0479.html\">'west0479'</a> in this section to demonstrate the benefits of using probing vectors over standard Gaussian vectors to sample the trace of matrix-logarithm. In the following we can easily observe the fill-in phenomena described earlier. Again, a ridge has been added to reduce the runtime for demonstration purpose."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.sparse import csc_matrix\n",
      "\n",
      "m = mmread('../../../data/logdet/west0479.mtx')\n",
      "# computing a spd with added ridge\n",
      "B = csc_matrix(m.transpose() * m + identity(m.shape[0]) * 1000.0)\n",
      "\n",
      "fig = plt.figure(figsize=(12, 4))\n",
      "ax = fig.add_subplot(1,2,1)\n",
      "ax.set_title('B')\n",
      "ax.spy(B, precision = 1e-5, marker = '.', markersize = 2.0)\n",
      "\n",
      "ax = fig.add_subplot(1,2,2)\n",
      "ax.set_title('lower Cholesky factor')\n",
      "dense_matrix = B.todense()\n",
      "L = cholesky(dense_matrix)\n",
      "ax.spy(csc_matrix(L), precision = 1e-5, marker = '.', markersize = 2.0)\n",
      "\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "op = RealSparseMatrixOperator(B)\n",
      "eigen_solver = LanczosEigenSolver(op)\n",
      "\n",
      "# computing log-det estimates using probing sampler\n",
      "probing_sampler = ProbingSampler(op)\n",
      "cgm.set_iteration_limit(500)\n",
      "\n",
      "op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, 1E-5)\n",
      "log_det_estimator = LogDetEstimator(probing_sampler, op_func, engine)\n",
      "num_probing_estimates = 100\n",
      "probing_estimates = log_det_estimator.sample(num_probing_estimates)\n",
      "\n",
      "# computing log-det estimates using Gaussian sampler\n",
      "from modshogun import NormalSampler, Statistics\n",
      "\n",
      "num_colors = probing_sampler.get_num_samples()\n",
      "normal_sampler = NormalSampler(op.get_dimension())\n",
      "log_det_estimator = LogDetEstimator(normal_sampler, op_func, engine)\n",
      "num_normal_estimates = num_probing_estimates * num_colors\n",
      "normal_estimates = log_det_estimator.sample(num_normal_estimates)\n",
      "\n",
      "# average in groups of n_effective_samples\n",
      "effective_estimates_normal = zeros(num_probing_estimates)\n",
      "for i in range(num_probing_estimates):\n",
      "    idx = i * num_colors\n",
      "    effective_estimates_normal[i] = mean(normal_estimates[idx:(idx + num_colors)])\n",
      "\n",
      "actual_logdet = Statistics.log_det(B)\n",
      "print 'Actual log(det(B)):', actual_logdet\n",
      "print 'Estimated log(det(B)) using probing sampler:', mean(probing_estimates)\n",
      "print 'Estimated log(det(B)) using Gaussian sampler:', mean(effective_estimates_normal)\n",
      "print 'Variance using probing sampler:',var(probing_estimates)\n",
      "print 'Variance using Gaussian sampler:',var(effective_estimates_normal)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fig = plt.figure(figsize=(15, 4))\n",
      "ax = fig.add_subplot(1,3,1)\n",
      "ax.set_title('Probing sampler')\n",
      "ax.plot(cumsum(probing_estimates)/(arange(len(probing_estimates))+1))\n",
      "ax.plot([0,len(probing_estimates)], [actual_logdet, actual_logdet])\n",
      "ax.legend([\"Probing\", \"True\"])\n",
      "\n",
      "ax = fig.add_subplot(1,3,2)\n",
      "ax.set_title('Gaussian sampler')\n",
      "ax.plot(cumsum(effective_estimates_normal)/(arange(len(effective_estimates_normal))+1))\n",
      "ax.plot([0,len(probing_estimates)], [actual_logdet, actual_logdet])\n",
      "ax.legend([\"Gaussian\", \"True\"])\n",
      "\n",
      "ax = fig.add_subplot(1,3,3)\n",
      "ax.hist(probing_estimates)\n",
      "ax.hist(effective_estimates_normal)\n",
      "ax.plot([actual_logdet, actual_logdet], [0,len(probing_estimates)], linewidth=3)\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h2>A motivational example - likelihood of the Ozone dataset</h2>\n",
      "<p>In <a href=\"http://arxiv.org/abs/1306.4032\">Girolami et. al. (2013)</a>, an interesting scenario is discussed where the log-likelihood of a model involving large spatial dataset is considered. The data, collected by a satellite consists of $N=173,405$ ozone measurements around the globe. The data is modelled using three stage hierarchical way -\n",
      "$$y_{i}|\\mathbf{x},\\kappa,\\tau\\sim\\mathcal{N}(\\mathbf{Ax},\\tau^{\u22121}\\mathbf{I})$$\n",
      "$$\\mathbf{x}|\\kappa\\sim\\mathcal{N}(\\mathbf{0}, \\mathbf{Q}(\\kappa))$$\n",
      "$$\\kappa\\sim\\log_{2}\\mathcal{N}(0, 100), \\tau\\sim\\log_{2}\\mathcal{N}(0, 100)$$\n",
      "Where the precision matrix, $\\mathbf{Q}$, of a Matern SPDE model, defined on a fixed traingulation of the globe, is sparse and the parameter $\\kappa$ controls for the range at which correlations in the field are effectively zero (see Girolami et. al. for details). The log-likelihood estiamate of the posterior using this model is\n",
      "$$2\\mathcal{L}=2\\log \\pi(\\mathbf{y}|\\kappa,\\tau)=C+\\log(\\text{det}(\\mathbf{Q}(\\kappa)))+N\\log(\\tau)\u2212\\log(\\text{det}(\\mathbf{Q}(\\kappa)+\\tau \\mathbf{A}^{T}\\mathbf{A}))\u2212 \\tau\\mathbf{y}^{T}\\mathbf{y}+\\tau^{2}\\mathbf{y}^{T}\\mathbf{A}(\\mathbf{Q}(\\kappa)+\\tau\\mathbf{A}^{T}\\mathbf{A})^{\u22121}\\mathbf{A}^{T}\\mathbf{y}$$\n",
      "In the expression, we have two terms involving log-determinant of large sparse matrices. The rational approximation approach described in the previous section can readily be applicable to estimate the log-likelihood. The following computation shows the usage of Shogun's log-determinant estimator for estimating this likelihood (code has been adapted from an open source library, <a href=\"https://github.com/karlnapf/ozone-roulette.git\">ozone-roulette</a>, written by Heiko Strathmann, one of the authors of the original paper).\n",
      "\n",
      "<b>Please note that we again added a ridge along the diagonal for faster execution of this example. Since the original matrix is badly conditioned, one needs to set the iteration limits very high for both the Eigen solver and the linear solver in absense of precondioning.</b>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.io import loadmat\n",
      "\n",
      "def get_Q_y_A(kappa):\n",
      "    # read the ozone data and create the matrix Q\n",
      "    ozone = loadmat('../../../data/logdet/ozone_data.mat')\n",
      "    GiCG = ozone[\"GiCG\"]\n",
      "    G = ozone[\"G\"]\n",
      "    C0 = ozone[\"C0\"]\n",
      "    kappa = 13.1\n",
      "    Q = GiCG + 2 * (kappa ** 2) * G + (kappa ** 4) * C0\n",
      "    # also, added a ridge here\n",
      "    Q = Q + eye(Q.shape[0], Q.shape[1]) * 10000.0\n",
      "    \n",
      "    plt.spy(Q, precision = 1e-5, marker = '.', markersize = 1.0)\n",
      "    plt.show()\n",
      "    \n",
      "    # read y and A\n",
      "    y = ozone[\"y_ozone\"]\n",
      "    A = ozone[\"A\"]\n",
      "    return Q, y, A\n",
      "    \n",
      "def log_det(A):\n",
      "    op = RealSparseMatrixOperator(A)\n",
      "    engine = SerialComputationEngine()\n",
      "    eigen_solver = LanczosEigenSolver(op)\n",
      "    probing_sampler = ProbingSampler(op)\n",
      "    cgm = CGMShiftedFamilySolver()\n",
      "    cgm.set_iteration_limit(100)\n",
      "    op_func = LogRationalApproximationCGM(op, engine, eigen_solver, cgm, 1E-5)\n",
      "    log_det_estimator = LogDetEstimator(probing_sampler, op_func, engine)\n",
      "    num_estimates = 1\n",
      "    return mean(log_det_estimator.sample(num_estimates))\n",
      "\n",
      "def log_likelihood(tau, kappa):\n",
      "    Q, y, A = get_Q_y_A(kappa)\n",
      "    n = len(y);\n",
      "    AtA = A.T.dot(A)\n",
      "    M = Q + tau * AtA;\n",
      "\n",
      "    # Computing log-determinants\")\n",
      "    logdet1 = log_det(Q)\n",
      "    logdet2 = log_det(M)\n",
      "        \n",
      "    first = 0.5 * logdet1 + 0.5 * n * log(tau) - 0.5 * logdet2\n",
      "    \n",
      "    # computing the rest of the likelihood\n",
      "    second_a = -0.5 * tau * (y.T.dot(y))\n",
      "    \n",
      "    second_b = array(A.T.dot(y))\n",
      "    from scipy.sparse.linalg import spsolve\n",
      "    second_b = spsolve(M, second_b)\n",
      "    second_b = A.dot(second_b)\n",
      "    second_b = y.T.dot(second_b)\n",
      "    second_b = 0.5 * (tau ** 2) * second_b\n",
      "        \n",
      "    log_det_part = first\n",
      "    quadratic_part = second_a + second_b\n",
      "    const_part = -0.5 * n * log(2 * pi)\n",
      "      \n",
      "    log_marignal_lik = const_part + log_det_part + quadratic_part\n",
      "\n",
      "    return log_marignal_lik\n",
      "\n",
      "L = log_likelihood(1.0, 15.0)\n",
      "print 'Log-likelihood estimate:', L"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h2>Useful components</h2>\n",
      "<p>As a part of the implementation of log-determinant estimator, a number of classes have been developed, which may come useful for several other occassions as well."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>1. <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLinearOperator.html\">Linear Operators</a></h3>\n",
      "All the linear solvers and Eigen solvers work with linear operators. Both real valued and complex valued operators are supported for dense/sparse matrix linear operators."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import RealSparseMatrixOperator, ComplexDenseMatrixOperator\n",
      "\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "# create a random valued sparse matrix linear operator\n",
      "A = csc_matrix(random.randn(dim, dim))\n",
      "op = RealSparseMatrixOperator(A)\n",
      "\n",
      "# creating a random vector\n",
      "random.seed(1)\n",
      "b = array(random.randn(dim))\n",
      "v = op.apply(b)\n",
      "print 'A.apply(b)=',v\n",
      "\n",
      "# create a dense matrix linear operator\n",
      "B = array(random.randn(dim, dim)).astype(complex)\n",
      "op = ComplexDenseMatrixOperator(B)\n",
      "print 'Dimension:', op.get_dimension()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>2. <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLinearSolver.html\">Linear Solvers</a></h3>\n",
      "<p> Conjugate Gradient based iterative solvers, that construct the Krylov subspace in their iteration by computing matrix-vector products are most useful for solving sparse linear systems. Here is an overview of CG based solvers that are currently available in Shogun.</p>\n",
      "<h4> <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CConjugateGradientSolver.html\">Conjugate Gradient Solver</a></h4>\n",
      "This solver solves for system $\\mathbf{Qx}=\\mathbf{y}$, where $\\mathbf{Q}$ is real-valued spd linear operator (e.g. dense/sparse matrix operator), and $\\mathbf{y}$ is real vector."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.sparse import csc_matrix\n",
      "from scipy.sparse import identity\n",
      "from modshogun import ConjugateGradientSolver\n",
      "\n",
      "# creating a random spd matrix\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "m = csc_matrix(random.randn(dim, dim))\n",
      "a = m.transpose() * m + identity(dim)\n",
      "Q = RealSparseMatrixOperator(a)\n",
      "\n",
      "# creating a random vector\n",
      "y = array(random.randn(dim))\n",
      "\n",
      "# solve the system Qx=y\n",
      "# the argument is set as True to gather convergence statistics (default is False)\n",
      "cg = ConjugateGradientSolver(True)\n",
      "cg.set_iteration_limit(20)\n",
      "x = cg.solve(Q,y)\n",
      "\n",
      "print 'x:',x\n",
      "\n",
      "# verifying the result\n",
      "print 'y:', y\n",
      "print 'Qx:', Q.apply(x)\n",
      "\n",
      "residuals = cg.get_residuals()\n",
      "plt.plot(residuals)\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h4><a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CConjugateOrthogonalCGSolver.html\">Conjugate Orthogonal CG Solver</a></h4>\n",
      "Solves for systems $\\mathbf{Qx}=\\mathbf{z}$, where $\\mathbf{Q}$ is symmetric but non-Hermitian (i.e. having complex entries in its diagonal) and $\\mathbf{z}$ is real valued vector."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import ComplexSparseMatrixOperator\n",
      "from modshogun import ConjugateOrthogonalCGSolver\n",
      "\n",
      "# creating a random spd matrix\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "m = csc_matrix(random.randn(dim, dim))\n",
      "a = m.transpose() * m + identity(dim)\n",
      "a = a.astype(complex)\n",
      "\n",
      "# adding a complex entry along the diagonal\n",
      "for i in range(0, dim):\n",
      "    a[i,i] += complex(random.randn(), random.randn())\n",
      "Q = ComplexSparseMatrixOperator(a)\n",
      "\n",
      "z = array(random.randn(dim))\n",
      "# solve for the system Qx=z\n",
      "cocg = ConjugateOrthogonalCGSolver(True)\n",
      "cocg.set_iteration_limit(20)\n",
      "x = cocg.solve(Q, z)\n",
      "\n",
      "print 'x:',x\n",
      "\n",
      "# verifying the result\n",
      "print 'z:',z\n",
      "print 'Qx:',real(Q.apply(x))\n",
      "\n",
      "residuals = cocg.get_residuals()\n",
      "plt.plot(residuals)\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h4><a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CCGMShiftedFamilySolver.html\">CG-M Shifted Family Solver</a></h4>\n",
      "Solves for systems with real valued spd matrices with complex shifts. For using it with log-det, an option to specify the weight of each solution is also there. The solve_shifted_weighted method returns $\\sum\\alpha_{l}\\mathbf{x}_{l}$ where $\\mathbf{x}_{l}=(\\mathbf{A}+\\sigma_{l}\\mathbf{I})^{-1}\\mathbf{y}$, $\\sigma,\\alpha\\in\\mathbb{C}$, $\\mathbf{y}\\in\\mathbb{R}$."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import CGMShiftedFamilySolver\n",
      "\n",
      "cgm = CGMShiftedFamilySolver()\n",
      "\n",
      "# creating a random spd matrix\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "m = csc_matrix(random.randn(dim, dim))\n",
      "a = m.transpose() * m + identity(dim)\n",
      "Q = RealSparseMatrixOperator(a)\n",
      "\n",
      "# creating a random vector\n",
      "v = array(random.randn(dim))\n",
      "\n",
      "# number of shifts (will be equal to the number of contour points)\n",
      "num_shifts = 3;\n",
      "\n",
      "# generating some random shifts\n",
      "shifts = []\n",
      "for i in range(0, num_shifts):\n",
      "    shifts.append(complex(random.randn(), random.randn()))\n",
      "sigma = array(shifts)\n",
      "print 'Shifts:', sigma\n",
      "\n",
      "# generating some random weights\n",
      "weights = []\n",
      "for i in range(0, num_shifts):\n",
      "    weights.append(complex(random.randn(), random.randn()))\n",
      "alpha = array(weights)\n",
      "print 'Weights:',alpha\n",
      "\n",
      "# solve for the systems\n",
      "cgm = CGMShiftedFamilySolver(True)\n",
      "cgm.set_iteration_limit(20)\n",
      "x = cgm.solve_shifted_weighted(Q, v, sigma, alpha)\n",
      "\n",
      "print 'x:',x\n",
      "\n",
      "residuals = cgm.get_residuals()\n",
      "plt.plot(residuals)\n",
      "plt.show()\n",
      "\n",
      "# verifying the result with cocg\n",
      "x_s = array([0+0j] * dim)\n",
      "for i in range(0, num_shifts):\n",
      "    a_s = a.astype(complex)\n",
      "    for j in range(0, dim):\n",
      "        # moving the complex shift inside the operator\n",
      "        a_s[j,j] += sigma[i]\n",
      "    Q_s = ComplexSparseMatrixOperator(a_s)\n",
      "    # multiplying the result with weight\n",
      "    x_s += alpha[i] * cocg.solve(Q_s, v)\n",
      "print 'x\\':', x_s"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Apart from iterative solvers, a few more triangular solvers are added.\n",
      "<h4><a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDirectSparseLinearSolver.html\">Direct Sparse Linear Solver</a></h4>\n",
      "This uses sparse Cholesky to solve for linear systems $\\mathbf{Qx}=\\mathbf{y}$, where $\\mathbf{Q}$ is real-valued spd linear operator (e.g. dense/sparse matrix operator), and $\\mathbf{y}$ is real vector."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import DirectSparseLinearSolver\n",
      "\n",
      "# creating a random spd matrix\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "m = csc_matrix(random.randn(dim, dim))\n",
      "a = m.transpose() * m + identity(dim)\n",
      "Q = RealSparseMatrixOperator(a)\n",
      "\n",
      "# creating a random vector\n",
      "y = array(random.randn(dim))\n",
      "\n",
      "# solve the system Qx=y\n",
      "chol = DirectSparseLinearSolver()\n",
      "x = chol.solve(Q,y)\n",
      "\n",
      "print 'x:',x\n",
      "\n",
      "# verifying the result\n",
      "print 'y:', y\n",
      "print 'Qx:', Q.apply(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h4><a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDirectLinearSolverComplex.html\">Direct Linear Solver for Complex</a></h4>\n",
      "This solves linear systems $\\mathbf{Qx}=\\mathbf{y}$, where $\\mathbf{Q}$ is complex-valued dense matrix linear operator, and $\\mathbf{y}$ is real vector."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from modshogun import DirectLinearSolverComplex\n",
      "\n",
      "# creating a random spd matrix\n",
      "dim = 5\n",
      "random.seed(10)\n",
      "m = array(random.randn(dim, dim))\n",
      "a = m.transpose() * m + identity(dim)\n",
      "a = a.astype(complex)\n",
      "\n",
      "# adding a complex entry along the diagonal\n",
      "for i in range(0, dim):\n",
      "    a[i,i] += complex(random.randn(), random.randn())\n",
      "Q = ComplexDenseMatrixOperator(a)\n",
      "\n",
      "z = array(random.randn(dim))\n",
      "# solve for the system Qx=z\n",
      "solver = DirectLinearSolverComplex()\n",
      "x = solver.solve(Q, z)\n",
      "\n",
      "print 'x:',x\n",
      "\n",
      "# verifying the result\n",
      "print 'z:',z\n",
      "print 'Qx:',real(Q.apply(x))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<h3>References</h3>\n",
      "<ol>\n",
      "<li> Erlend Aune, Daniel P. Simpson, Jo Eidsvik, <i>Parameter estimation in high dimensional Gaussian distributions</i>. Springer Statistics and Computing, December 2012.</li>\n",
      "<li> Nicholas Hale, Nicholas J. Higham and Lloyd N. Trefethen, <i>Computing $A^{\\alpha}$, $\\log(A)$ and Related Matrix Functions by Contour Integrals</i>, MIMS EPrint: 2007.103</li>\n",
      "<li> H.A. van der Vorst, <i>A Petrov-Galerkin Type Method for Solving $\\mathbf{Ax}=\\mathbf{b}$ Where $\\mathbf{A}$ Is Symmetric Complex</i>, IEEE TRANSACTIONS ON MAGNETICS, VOL. 26, NO. 2, MARCH 1990</li>\n",
      "<li> B. Jegerlehner, <i>Krylov space solvers for shifted linear systems</i>, HEP-LAT heplat/9612014, 1996</li>\n",
      "<li> Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade, <i>Playing Russian Roulette with Intractable Likelihoods</i>,arXiv:1306.4032 June 2013</li>\n"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}
